Sphyrna lewini

BOLDGenotyper Analysis Report
Generated: 2025-11-20 15:43:27
Total Samples: 617
Genotypes: 10
Assignment Rate: 96.3%

Analysis Summary

Total Samples: 617 (samples analyzed)
With Sequence Data: 598 (96.9% of total)
Consensus Groups: 10 (unique genotypes identified)
Species Identified: 1 (distinct species)
Successfully Assigned: 594 (96.3% assignment rate)
Mean Identity: 98.7% (for assigned samples)
Ocean Basins: 7 (geographic coverage)

Pipeline Parameters

The following parameters were used for this analysis:

Full parameters available at: {organism}_pipeline_parameters.json

Parameter | Value | Description
Clustering Threshold | 0.015 (98.5% identity) | Maximum genetic distance for grouping sequences into consensus genotypes. Lower values create more groups with tighter genetic similarity.
Similarity Threshold | 0.5 (50% identity) | Minimum sequence identity required for assigning samples to genotypes. Samples below this threshold are marked as unassigned.
Tie Margin | 0.001 (0.1% difference) | Maximum identity difference between top matches to flag as ambiguous. Samples with (best - runner-up) < tie margin are flagged for manual review.
Tie Threshold | 0.95 (95% identity) | Minimum best-match identity required to consider tie detection. Prevents flagging low-quality matches as ties.
Threads | 4 | Number of parallel processing threads used.
Phylogenetic Tree | True | Whether a phylogenetic tree was constructed.
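
As a rough illustration of how the similarity threshold, tie margin, and tie threshold interact, the sketch below classifies one sample from its identities to each consensus genotype. The function name `classify_sample`, the status labels, and the order of the checks are assumptions made for illustration; BOLDGenotyper's actual implementation may differ.

```python
from typing import Dict, Tuple

SIMILARITY_THRESHOLD = 0.5   # minimum identity for any assignment
TIE_MARGIN = 0.001           # (best - runner-up) below this counts as a tie
TIE_THRESHOLD = 0.95         # ties are only considered at or above this identity


def classify_sample(identities: Dict[str, float]) -> Tuple[str, str]:
    """Return (status, genotype) for one sample.

    `identities` maps each consensus genotype name to the sample's
    identity against it (0.0 to 1.0). Hypothetical helper, not the
    pipeline's own API.
    """
    if not identities:
        return ("no_sequence", "")
    # Rank genotypes from highest to lowest identity.
    ranked = sorted(identities.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best = ranked[0]
    if best < SIMILARITY_THRESHOLD:
        return ("below_threshold", "")
    # Only high-identity matches are checked for ties.
    if len(ranked) > 1 and best >= TIE_THRESHOLD:
        runner_up = ranked[1][1]
        if best - runner_up < TIE_MARGIN:
            return ("tie", best_name)
    return ("assigned", best_name)


print(classify_sample({"c1": 0.9970, "c3": 0.9965}))  # top two within the tie margin
print(classify_sample({"c1": 0.99, "c3": 0.97}))      # clear best match
print(classify_sample({"c1": 0.40}))                  # below the similarity threshold
```

With these settings, a near-tie between two low-identity matches (for example 0.80 against 0.80) is not flagged, because the best match falls under the tie threshold.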

Visualizations

High-resolution images (PNG/PDF) and data files (JSON/CSV) available in output directories (see individual plots for locations)

Identity Distribution

Distribution of sequence identity scores for assigned samples

Files: genotype_assignments/Sphyrna lewini_identity_distribution.png, Sphyrna lewini_identity_distribution.pdf

Phylogenetic Tree

Phylogenetic tree showing relationships between consensus groups

Files: phylogenetic/Sphyrna lewini_tree.png, Sphyrna lewini_tree.pdf

Tip: Newick tree file available at phylogenetic/Sphyrna lewini_tree_relabeled.nwk for opening in tree editors such as TreeViewer (Bianchini & Sánchez-Baracaldo, 2024) for re-rooting and customization

Relative Abundance by Ocean Basin

Relative abundance of genotypes across ocean basins

Files: visualization/Sphyrna lewini_distribution_bar.png, Sphyrna lewini_distribution_bar.pdf, Sphyrna lewini_distribution_bar_data.json

Total Abundance by Ocean Basin

Total sample counts of genotypes across ocean basins

Files: visualization/Sphyrna lewini_totaldistribution_bar.png, Sphyrna lewini_totaldistribution_bar.pdf, Sphyrna lewini_totaldistribution_bar_data.json

Total Abundance by Ocean Basin (Faceted)

Total sample counts faceted by species or genotype

Files: visualization/Sphyrna lewini_distribution_bar_faceted.png, Sphyrna lewini_distribution_bar_faceted.pdf, Sphyrna lewini_distribution_bar_faceted_data.json

Distribution Map

Geographic distribution of samples

Files: visualization/Sphyrna lewini_distribution_map.png, Sphyrna lewini_distribution_map.pdf, Sphyrna lewini_distribution_map_data.json

Distribution Map (Faceted)

Geographic distribution faceted by species or genotype

Files: visualization/Sphyrna lewini_distribution_map_faceted.png, Sphyrna lewini_distribution_map_faceted.pdf

Methods

Comprehensive analysis methodology and parameters suitable for reporting in peer-reviewed publications.

1. Analysis Overview

Pipeline Version: BOLDGenotyper v1.0.0
Analysis Date: 2025-11-20
Input File: data/Sphyrna_lewini_scallopedhammerhead.tsv

2. Sample Processing & Quality Control

Total samples loaded: 685
Duplicate samples removed: 10
Unique samples after deduplication: 675
Coordinate quality filtering: 617/675 (91.4%) retained
Centroid coordinates excluded: 58
Valid sequences for analysis: 598
Missing/invalid sequences excluded: 19
Sequences too short after trimming: 3

3. Sequence Dereplication

Algorithm: Hierarchical clustering (average linkage)
Alignment: MAFFT --auto --thread 4
Trimming: trimAl
Minimum sequence length: 400 bp
Clustering threshold: 0.015 (98.5% identity)
Pairwise comparisons: 176,715
Consensus genotypes identified: 10
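
The pairwise-comparison count matches an all-against-all comparison of 595 sequences (598 valid sequences minus the 3 that were too short after trimming), since n(n-1)/2 = 176,715 for n = 595. The sketch below checks that arithmetic and shows a naive pure-Python average-linkage clustering cut at the 0.015 distance threshold. That the pipeline compares exactly these 595 sequences is an inference from the reported figures, and `average_linkage_clusters` with its toy distance matrix is a hypothetical stand-in, not BOLDGenotyper's own code.

```python
from typing import List


def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n sequences."""
    return n * (n - 1) // 2


def average_linkage_clusters(dist: List[List[float]], threshold: float) -> List[List[int]]:
    """Greedy agglomerative clustering with average linkage.

    `dist` is a symmetric distance matrix. Clusters are merged while the
    smallest mean inter-cluster distance is at or below `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]

    def mean_distance(a: List[int], b: List[int]) -> float:
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        d, x, y = min(
            (mean_distance(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


clustered = 598 - 3  # valid sequences minus those too short after trimming
print(clustered, pairwise_count(clustered))

# Toy matrix: sequences 0/1 and 2/3 are close pairs; the two pairs are 3% apart.
toy = [
    [0.000, 0.004, 0.030, 0.030],
    [0.004, 0.000, 0.030, 0.030],
    [0.030, 0.030, 0.000, 0.010],
    [0.030, 0.030, 0.010, 0.000],
]
print(average_linkage_clusters(toy, threshold=0.015))
```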

4. Genotype Assignment

Assignment method: Edit distance (edlib)
Similarity threshold: 0.5 (50% identity)
Tie detection margin: 0.001 (0.1%)
Tie detection threshold: 0.95 (95% identity)
Successfully assigned: 595/617 (96.4%)
Unassigned (no sequence): 19
Unassigned (below threshold): 3
Ambiguous assignments (ties): 1
Low confidence assignments: 0
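
To make the link between edit distance and percent identity concrete, here is a minimal sketch that converts an edit distance into an identity score. It uses a plain dynamic-programming Levenshtein distance in place of edlib, and it normalizes by the longer sequence length; both choices are assumptions for illustration, since the report does not state how the pipeline normalizes the distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(sample: str, consensus: str) -> float:
    """Identity as 1 - distance / longer length (assumed normalization)."""
    longest = max(len(sample), len(consensus))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(sample, consensus) / longest


# One mismatch in eight bases gives an identity of 0.875.
print(identity("ACGTACGT", "ACGTACGA"))
```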

5. Phylogenetic Analysis

Alignment: MAFFT --auto --thread 4
Tree inference: FastTree
FastTree parameters: -nt -gtr -gamma
Substitution model: GTR+Gamma
Number of taxa: 10
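
The alignment and tree-inference settings above could be reproduced with commands along the following lines. This is a sketch under stated assumptions: the helper names and file names are hypothetical, the executable names (`mafft`, `FastTree`) may differ between installations, and the trimAl step is omitted because the report does not list its options. Both tools write their result to standard output, so the helper redirects it to a file.

```python
import subprocess
from pathlib import Path
from typing import List


def mafft_command(fasta: Path, threads: int = 4) -> List[str]:
    """MAFFT invocation matching the reported '--auto --thread 4' settings."""
    return ["mafft", "--auto", "--thread", str(threads), str(fasta)]


def fasttree_command(alignment: Path) -> List[str]:
    """FastTree invocation for nucleotide data with GTR+Gamma."""
    return ["FastTree", "-nt", "-gtr", "-gamma", str(alignment)]


def run_to_file(command: List[str], output: Path) -> None:
    """Run a command and capture its standard output in `output`."""
    with output.open("w") as handle:
        subprocess.run(command, stdout=handle, check=True)


# Show the commands without executing them (file names are placeholders).
print(" ".join(mafft_command(Path("consensus.fasta"))))
print(" ".join(fasttree_command(Path("consensus_aligned.fasta"))))
```

To actually run the steps, something like `run_to_file(fasttree_command(Path("consensus_aligned.fasta")), Path("tree.nwk"))` would write the Newick tree to a file, provided the tools are installed and on the PATH.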

6. Geographic Analysis

Reference dataset: GOaS v1 (Global Oceans and Seas)
Samples with coordinates: 171/617 (27.7%)
Ocean basin assignments: 71/171 (41.5%)
Outside known basins: 100
Unknown location: 546 samples
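
The geographic counts fit together as follows: samples without coordinates plus samples whose coordinates fall outside known basins give the "unknown location" total. The short check below is only arithmetic on the reported figures; reading "unknown location" as the sum of those two groups is an inference, not something the report states.

```python
total_samples = 617
with_coordinates = 171
assigned_to_basin = 71

without_coordinates = total_samples - with_coordinates    # no usable coordinates
outside_basins = with_coordinates - assigned_to_basin     # coordinates, but no basin match
unknown_location = without_coordinates + outside_basins   # excluded from basin analyses


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(without_coordinates, outside_basins, unknown_location)
print(pct(with_coordinates, total_samples), pct(assigned_to_basin, with_coordinates))
print(pct(assigned_to_basin, total_samples), pct(unknown_location, total_samples))
```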

7. Software & Dependencies

BOLDGenotyper: v1.0.0
MAFFT: Multiple sequence alignment (Katoh & Standley, 2013)
trimAl: Alignment trimming (Capella-Gutiérrez et al., 2009)
FastTree: Phylogenetic inference (Price et al., 2010)
edlib: Edit distance calculation (Šošić & Šikić, 2017)
GOaS: Global Oceans and Seas dataset (Flanders Marine Institute, 2021)

8. Methods Statement

DNA barcode sequences for Sphyrna lewini were downloaded from the Barcode of Life Data System (BOLD; Ratnasingham & Hebert, 2007) and processed using BOLDGenotyper v1.0.0. A total of 598 sequences were analyzed after removing duplicates and filtering for sequence quality (minimum length: 400 bp). Sequences were aligned using MAFFT (Katoh & Standley, 2013) and trimmed with trimAl (Capella-Gutiérrez et al., 2009). Consensus genotypes were identified through hierarchical clustering at 98.5% sequence identity (distance threshold 0.015) using average linkage. Individual sequences were assigned to genotypes using edit distance calculations with edlib (Šošić & Šikić, 2017; minimum identity: 50%). A phylogenetic tree was constructed using FastTree (Price et al., 2010) with the GTR+Gamma substitution model. Geographic distributions were mapped using coordinates provided in BOLD and assigned to ocean basins using the Global Oceans and Seas (GOaS) v1 dataset (Flanders Marine Institute, 2021). In total, 595 of 617 samples (96.4%) were successfully assigned to 10 consensus genotypes.

9. References

Bianchini, G., & Sánchez-Baracaldo, P. (2024). TreeViewer: Flexible, modular software to visualise and manipulate phylogenetic trees. Ecology and Evolution, 14, e10873. https://doi.org/10.1002/ece3.10873

Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 1972-1973.

Flanders Marine Institute (2021). Global Oceans and Seas, version 1. Available online at https://www.marineregions.org/

Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution, 30(4), 772-780.

Price, M. N., Dehal, P. S., & Arkin, A. P. (2010). FastTree 2 – Approximately maximum-likelihood trees for large alignments. PLoS ONE, 5(3), e9490.

Ratnasingham, S., & Hebert, P. D. (2007). BOLD: The Barcode of Life Data System. Molecular Ecology Notes, 7(3), 355-364.

Šošić, M., & Šikić, M. (2017). Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics, 33(9), 1394-1395.

Genotype Assignment Results

Full assignment data available at: reports/{organism}_assignment_summary.csv and genotype_assignments/{organism}_diagnostics.csv

Assignment Status Breakdown

Status | Count | Percentage
Successfully Assigned | 594 | 96.3%
Low Confidence | 0 | 0.0%
Tied Assignment | 1 | 0.2%
Below Threshold | 3 | 0.5%
No Sequence Data | 19 | 3.1%
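
The assignment figure in this breakdown (594, 96.3%) and the one in the Methods section (595/617, 96.4%) differ by exactly the one tied assignment. The check below reproduces both figures from the status counts. It assumes the Methods figure counts the tied sample as assigned, which the report does not state explicitly.

```python
status_counts = {
    "Successfully Assigned": 594,
    "Low Confidence": 0,
    "Tied Assignment": 1,
    "Below Threshold": 3,
    "No Sequence Data": 19,
}

total = sum(status_counts.values())
percentages = {
    status: round(100 * count / total, 1)
    for status, count in status_counts.items()
}

# Samples with any genotype match, including the tie (assumed Methods definition).
with_match = (status_counts["Successfully Assigned"]
              + status_counts["Low Confidence"]
              + status_counts["Tied Assignment"])

print(total, percentages)
print(with_match, round(100 * with_match / total, 1))
```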

Identity Score Statistics (Assigned Samples)

Metric | Value
Mean Identity | 98.66%
Median Identity | 99.69%
Minimum Identity | 76.61%
Maximum Identity | 100.00%

Taxonomy

Consensus Group Taxonomy

Full table available at: taxonomy/{organism}_consensus_taxonomy.csv

consensus_group | assigned_sp | assignment_level | assignment_notes | majority_fraction
consensus_c10_n1 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c1_n211 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c2_n1 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c3_n146 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c4_n1 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c5_n1 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c6_n5 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c7_n101 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c8_n127 | Sphyrna lewini | species | majority 1.00 | 1.0
consensus_c9_n1 | Sphyrna lewini | species | majority 1.00 | 1.0

Species Composition by Consensus Group

Full table available at: taxonomy/{organism}_species_by_consensus.csv

consensus_group | reported_species | n | frac | n_in_group
consensus_c10_n1 | Sphyrna lewini | 1 | 1.0 | 1.0
consensus_c1_n211 | Sphyrna lewini | 211 | 1.0 | 211.0
consensus_c2_n1 | Sphyrna lewini | 1 | 1.0 | 1.0
consensus_c3_n146 | Sphyrna lewini | 146 | 1.0 | 146.0
consensus_c4_n1 | Sphyrna lewini | 1 | 1.0 | 1.0
consensus_c5_n1 | Sphyrna lewini | 1 | 1.0 | 1.0
consensus_c6_n5 | Sphyrna lewini | 5 | 1.0 | 5.0
consensus_c7_n101 | Sphyrna lewini | 101 | 1.0 | 101.0
consensus_c8_n127 | Sphyrna lewini | 127 | 1.0 | 127.0
consensus_c9_n1 | Sphyrna lewini | 1 | 1.0 | 1.0
NaN | Sphyrna lewini | 22 | NaN | NaN

Geographic Distribution

Full annotated dataset with geographic data available at: {organism}_annotated.csv

Geographic Analysis Summary: 71 of 617 samples (11.5%) were assigned to an ocean basin. The remaining 546 samples (88.5%), whose geography is unknown or missing, were excluded from geographic analyses.

Sample Distribution by Ocean Basin

Ocean Basin | Sample Count | Percentage (%)
South China and Easter Archipelagic Seas | 39 | 54.9
Indian Ocean | 21 | 29.6
South Atlantic Ocean | 4 | 5.6
North Atlantic Ocean | 3 | 4.2
South Pacific Ocean | 3 | 4.2
North Pacific Ocean | 1 | 1.4

Genotypes per Ocean Basin

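consensus_group | Indian Ocean | North Atlantic Ocean | North Pacific Ocean | South Atlantic Ocean | South China and Easter Archipelagic Seas | South Pacific Ocean
consensus_c10_n1 | 0 | 1 | 0 | 0 | 0 | 0
consensus_c1_n211 | 17 | 0 | 0 | 3 | 2 | 0
consensus_c3_n146 | 3 | 0 | 0 | 1 | 0 | 1
consensus_c6_n5 | 0 | 2 | 0 | 0 | 0 | 0
consensus_c7_n101 | 1 | 0 | 1 | 0 | 8 | 2
consensus_c8_n127 | 0 | 0 | 0 | 0 | 29 | 0

Missing Geography: 446 samples (72.3%) do not have valid coordinate data. A further 100 samples have coordinates that fall outside known ocean basins, which accounts for the 546 samples excluded from geographic analyses.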
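
The relative abundance of each genotype within a basin follows directly from the count table above: each cell is divided by its basin (column) total. The sketch below shows that calculation on the reported counts. The function name `relative_abundance` and the data layout are illustrative assumptions and may not match how the pipeline computes the values in its plots.

```python
from typing import Dict, List

BASINS = [
    "Indian Ocean",
    "North Atlantic Ocean",
    "North Pacific Ocean",
    "South Atlantic Ocean",
    "South China and Easter Archipelagic Seas",
    "South Pacific Ocean",
]

# Sample counts per genotype, in the same basin order as BASINS.
COUNTS: Dict[str, List[int]] = {
    "consensus_c10_n1": [0, 1, 0, 0, 0, 0],
    "consensus_c1_n211": [17, 0, 0, 3, 2, 0],
    "consensus_c3_n146": [3, 0, 0, 1, 0, 1],
    "consensus_c6_n5": [0, 2, 0, 0, 0, 0],
    "consensus_c7_n101": [1, 0, 1, 0, 8, 2],
    "consensus_c8_n127": [0, 0, 0, 0, 29, 0],
}


def relative_abundance(counts: Dict[str, List[int]],
                       basins: List[str]) -> Dict[str, Dict[str, float]]:
    """Share of each genotype within each basin (genotypes with zero samples omitted)."""
    totals = [sum(row[i] for row in counts.values()) for i in range(len(basins))]
    return {
        basin: {g: row[i] / totals[i] for g, row in counts.items() if row[i] > 0}
        for i, basin in enumerate(basins)
        if totals[i] > 0
    }


abundance = relative_abundance(COUNTS, BASINS)
for basin, shares in abundance.items():
    summary = ", ".join(f"{g}: {share:.1%}" for g, share in shares.items())
    print(f"{basin}: {summary}")
```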